588. Design In-Memory File System
Hard

Design a data structure that simulates an in-memory file system.

Implement the FileSystem class:

  • FileSystem() Initializes the object of the system.
  • List<String> ls(String path)
    • If path is a file path, returns a list that only contains this file's name.
    • If path is a directory path, returns the list of file and directory names in this directory.
    The answer should in lexicographic order.
  • void mkdir(String path) Makes a new directory according to the given path. The given directory path does not exist. If the middle directories in the path do not exist, you should create them as well.
  • void addContentToFile(String filePath, String content)
    • If filePath does not exist, creates that file containing given content.
    • If filePath already exists, appends the given content to original content.
  • String readContentFromFile(String filePath) Returns the content in the file at filePath.

 

Example 1:

Input
["FileSystem", "ls", "mkdir", "addContentToFile", "ls", "readContentFromFile"]
[[], ["/"], ["/a/b/c"], ["/a/b/c/d", "hello"], ["/"], ["/a/b/c/d"]]
Output
[null, [], null, null, ["a"], "hello"]

Explanation
FileSystem fileSystem = new FileSystem();
fileSystem.ls("/");                         // return []
fileSystem.mkdir("/a/b/c");
fileSystem.addContentToFile("/a/b/c/d", "hello");
fileSystem.ls("/");                         // return ["a"]
fileSystem.readContentFromFile("/a/b/c/d"); // return "hello"

 

Constraints:

  • 1 <= path.length, filePath.length <= 100
  • path and filePath are absolute paths which begin with '/' and do not end with '/' except that the path is just "/".
  • You can assume that all directory names and file names only contain lowercase letters, and the same names will not exist in the same directory.
  • You can assume that all operations will be passed valid parameters, and users will not attempt to retrieve file content or list a directory or file that does not exist.
  • 1 <= content.length <= 50
  • At most 300 calls will be made to ls, mkdiraddContentToFile, and readContentFromFile.
Accepted
35,497
Submissions
75,663
Seen this question in a real interview before?
Companies
Related Topics
Similar Questions

Average Rating: 4.95 (60 votes)

Solution


Approach #1 Using separate Directory and File List[Accepted]

We start our discussion by looking at the directory structure used. The root directory acts as the base of the directory structure. Each directory contains two hashmaps namely dirsdirs and filesfiles. The dirsdirs contains data in the form [(subdirectory1_name:subdirectory1_structure),(subdirectory2_name:subdirectory2_structure)...][(subdirectory_1\_name: subdirectory_{1\_structure}), (subdirectory_2\_name: subdirectory_{2\_structure})...]. The filesfiles contains data in the form [(file1:file1_contents),(file2:file2_contents)...][(file_1: file_{1\_contents}), (file_2: file_{2\_contents})...]. This directory structure is shown below with a sample showing just the first two levels.

Design_Memory

Now, we'll discuss how we implement the various commands required.

  1. ls: In this case, we start off by initializing tt, a temporary directory pointer, to the root directory. We split the input directory path based on / and obtain the individual levels of directory names in a dd array. Then, we traverse over the tree directory structure based on the individual directories found and we keep on updating the tt directory pointer to point to the new level of directory(child) as we go on entering deeper into the directory structure. At the end, we will stop at either the end level directory or at the file name depending upon the input given. If the last level in the input happens to be a file name, we simply need to return the file name. So, we directly return the last entry in the dd array. If the last level entry happens to be a directory, we can obtain its subdirectory list from the list of keys in its dirsdirs hashmap. Similarly, we can obtain the list of files in the last directory from the keys in the corresponding filesfiles hashmap. We append the two lists obtained, sort them and return the sorted appended list.

  2. mkdir: In response to this command, as in case of ls, we start entering the directory structure level by level. Whenever we reach a state where a directory mentioned in the path of mkdir doesn't exist, we create a new entry in the last valid directory's dirsdirs structure and initialize its subdirectory list as an empty list. We keep on doing so till we reach the end level directory.

  3. addContentToFile: In response to this command as well, as in case of ls, we start entering the directory structure level by level. When we reach the level of the file name, we check if the file name already exists in the filesfiles keys. If it exists, we concatenate the current contents to the contents of the file(in the value section of the corresponding file). If it doesn't exist, we create a new entry in the current directory's filesfiles and initialize its contents with the current contents.

  4. readContentFromFile: As the previous cases, we reach the last directory level by traversing through the directory structure level by level. Then, in the last directory, we search for the file name entry in the corresponding filesfiles' keys and return its corresponding value as the contents of the file.

Performance Analysis

  • The time complexity of executing an ls command is O(m+n+klog(k))O\big(m+n+klog(k)\big). Here, mm refers to the length of the input string. We need to scan the input string once to split it and determine the various levels. nn refers to the depth of the last directory level in the given input for ls. This factor is taken because we need to enter nn levels of the tree structure to reach the last level. kk refers to the number of entries(files+subdirectories) in the last level directory(in the current input). We need to sort these names giving a factor of klog(k)klog(k).

  • The time complexity of executing an mkdir command is O(m+n)O(m+n). Here, mm refers to the length of the input string. We need to scan the input string once to split it and determine the various levels. nn refers to the depth of the last directory level in the mkdir input. This factor is taken because we need to enter nn levels of the tree structure to reach the last level.

  • The time complexity of both addContentToFile and readContentFromFile is O(m+n)O(m+n). Here, mm refers to the length of the input string. We need to scan the input string once to split it and determine the various levels. nn refers to the depth of the file name in the current input. This factor is taken because we need to enter nn levels of the tree structure to reach the level where the files's contents need to be added/read from.

  • The advantage of this scheme of maintaining the directory structure is that it is expandable to include even more commands easily. For example, rmdir to remove a directory given an input directory path. We need to simply reach to the destined directory level and remove the corresponding directory entry from the corresponding dirsdirs keys.

  • Renaming files/directories is also very simple, since all we need to do is to create a temporary copy of the directory structure/file with a new name and delete the last entry.

  • Relocating a hierarchichal subdirectory structure from one directory to the other is also very easy, since, all we need to do is obtain the address for the corresponding subdirectory class, and assign the same at the new positon in the new directory structure.

  • Extracting only directories or files list on any path is easy in this case, since we maintain separate entires for dirsdirs and filesfiles.


Approach #2 Using unified Directory and File List[Accepted]

This design differs from the first design in that the current data structure for a Directory contains a unified filesfiles hashmap, which contains the list of all the files and subdirectories in the current directory. Apart from this, we contain an entry isfileisfile, which when True indicates that the current filesfiles entry is actually corresponding to a file, otherwise it represents a directory. Further, since we are considering the directory and files' entries in the same manner, we need an entry for contentcontent, which contains the contents of the current file(if isfileisfile entry is True in the current case). For entries corresponding to directories, the contentcontent field is kept empty.

The following figure shows the directory structure for the same example as in the case above, for the first two levels of the hierarchical structure.

Design_In_Memory

The implementation of all the commands remains the same as in the last design, except that we need to make entries in the same filesfiles hashmap for both files and directories, corresponding to addContentToFile and mkdir respectively. Further, for ls, we need not extract entries separately for the files and directories, since they are unified in the current case, and can be obtained in a single go.

This approach is inspired by @shawngao

Performance Analysis

  • The time complexity of executing an ls command is O(m+n+klog(k))O\big(m+n+klog(k)\big). Here, mm refers to the length of the input string. We need to scan the input string once to split it and determine the various levels. nn refers to the depth of the last directory level in the given input for ls. This factor is taken because we need to enter nn levels of the tree structure to reach the last level. kk refers to the number of entries(files+subdirectories) in the last level directory(in the current input). We need to sort these names giving a factor of klog(k)klog(k).

  • The time complexity of executing an mkdir command is O(m+n)O(m+n). Here, mm refers to the length of the input string. We need to scan the input string once to split it and determine the various levels. nn refers to the depth of the last directory level in the mkdir input. This factor is taken because we need to enter nn levels of the tree structure to reach the last level.

  • The time complexity of both addContentToFile and readContentFromFile is O(m+n)O(m+n). Here, mm refers to the length of the input string. We need to scan the input string once to split it and determine the various levels. nn refers to the depth of the file name in the current input. This factor is taken because we need to enter nn levels of the tree structure to reach the level where the files's contents need to be added/read from.

  • The advantage of this scheme of maintaining the directory structure is that it is expandable to include even more commands easily. For example, rmdir to remove a directory given an input directory path. We need to simply reach to the destined directory level and remove the corresponding directory entry from the corresponding dirsdirs keys.

  • Renaming files/directories is also very simple, since all we need to do is to create a temporary copy of the directory structure/file with a new name and delete the last entry.

  • Relocating a hierarchichal subdirectory structure from one directory to the other is also very easy, since, all we need to do is obtain the address for the corresponding subdirectory class, and assign the same at the new positon in the new directory structure.

  • If the number of directories is very large, we waste redundant space for isfileisfile and contentcontent, which wasn't needed in the first design.

  • A problem with the current design could occur if we want to list only the directories(and not the files), on any given path. In this case, we need to traverse over the whole contents of the current directory, check for each entry, whether it is a file or a directory, and then extract the required data.

Report Article Issue

Comments: 19
chris8585's avatar
Read More

TreeMap can be used instead of HashMap to maintain lexicographic ordering

35
Show 4 replies
Reply
Share
Report
yuxiong's avatar
Read More

@vinod23 Thanks for sharing. It's a very detailed and clear article.
However, there is one thing I'd like to discuss. In the performance analysis, I think the 'n' part could be removed. I want to claim that n = O(m), so O(m+n) = O(m). This can be easily proven as following, if the length of the path string is m, then the max levels of hierarchies (intermediate directories) it can contain is at most m/2, where each directory is a single character. Thus, the levels we must enter is n = m/2 = O(m).

15
Show 1 reply
Reply
Share
Report
yan77's avatar
Read More

The variable naming of the first solution is really bad.

8
Reply
Share
Report
mission_2020's avatar
Read More

I dont think its a hard level. Frankly its not complex. One may not come up with a solution, but complexity level of the solution is not hard.

7
Show 1 reply
Reply
Share
Report
sisli's avatar
Read More

Why start at int i = 1?

2
Show 2 replies
Reply
Share
Report
fkie4's avatar
Read More

who the hell can finish this in an hour interview?

3
Show 2 replies
Reply
Share
Report
LeandroIvan's avatar
Read More

The test 59 of 63 gives the wrong result, even if entering the test case manually it actually pass.

["FileSystem","ls","mkdir","addContentToFile","ls","readContentFromFile","addContentToFile","readContentFromFile","addContentToFile","ls","readContentFromFile"]
[[],["/"],["/a/b/c"],["/a/b/c/d","hello world"],["/"],["/a/b/c/d"],["/a/b/c/d"," hello hello world"],["/a/b/c/d"],["/a/b/c/dddd","winstons"],["/a/b/c"],["/a/b/c/dddd"]]

The right output should be :
[null,[],null,null,["a"],"hello world",null,"hello world hello hello world",null,["d","dddd"],"winstons"]

Instead the test says it expect:
[null,[],null,null,["a"],"hello world",null,"hello world hello hello world",null,["d","dddd"],""]

Which is actually wrong.

1
Reply
Share
Report
brian37's avatar
Read More

the first diagram is wrong which dir 'a' and 'b' are parent and sub relationship rather than on the same level.

0
Reply
Share
Report
shaan3's avatar
Read More

Great article. Cleared a lot of doubts.

0
Reply
Share
Report
acharya2021's avatar
Read More

I have a question about the klogk part of the time complexity (O(m+n+klog(k))) for the ls command. Quoting the solution, "k refers to the number of entries(files+subdirectories) in the last level directory(in the current input). We need to sort these names giving a factor of klog(k)." I agree that since there are k entries, there are klogk comparisons. But each of those entries is a string, say, of length s. Shouldn't we, therefore, factor the O(s) time it takes to compare each string? If yes, the time complexity would be O(m+n+ s * k(logk))

0
Reply
Share
Report

You don't have any submissions yet.

/23
Contribute
|||
Saved
Trie
This list is empty.